On-Line Cumulative Learning of Hierarchical Sparse n-grams
Author
Abstract
We present a system for on-line, cumulative learning of hierarchical collections of frequent patterns from unsegmented data streams. Such learning is critical for long-lived intelligent agents in complex worlds. Learned patterns enable prediction of unseen data and serve as building blocks for higher-level knowledge representation. We introduce a novel sparse n-gram model that, unlike pruned n-grams, learns on-line by stochastic search for frequent n-tuple patterns. Adding patterns as data arrives complicates probability calculations. We discuss an EM approach to this problem and introduce hierarchical sparse n-grams, a model that uses a better solution based on a new method for combining information across levels. A second new method for combining information from multiple granularities (n-gram widths) enables these models to search more effectively for frequent patterns (an on-line, stochastic analog of pruning in association rule mining). The result is an example of a rare combination: unsupervised, on-line, cumulative structure learning. Unlike prediction suffix tree (PST) mixtures, the model learns with no size bound, yet uses less space than the data. It does not repeatedly iterate over data (unlike MaxEnt feature construction). It discovers repeated structure on-line and (unlike PSTs) uses this to learn larger patterns. The type of repeated structure is limited (e.g., compared to hierarchical HMMs) but still useful, and these are important first steps toward learning repeated structure in more expressive representations, an area that has seen little progress, especially in unsupervised, on-line contexts.
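To make the core idea concrete, the following is a minimal sketch (not the paper's actual algorithm, which uses stochastic search and hierarchical levels) of an on-line sparse n-gram: counts are kept only for a bounded set of n-tuples seen in an unsegmented stream, with periodic pruning to the most frequent. The class name, `capacity` parameter, and pruning schedule are illustrative assumptions.

```python
from collections import Counter, deque

class SparseNGram:
    """Illustrative sketch only: store counts for a bounded number of
    n-tuples from an unsegmented symbol stream, pruning to the most
    frequent (a simple stand-in for the paper's stochastic search)."""

    def __init__(self, n=3, capacity=1000):
        self.n = n
        self.capacity = capacity          # cap on stored patterns (assumed knob)
        self.counts = Counter()           # n-tuple -> observed count
        self.window = deque(maxlen=n)     # sliding window over the stream
        self.total = 0                    # total n-tuples observed

    def observe(self, symbol):
        """Consume one symbol from the stream, on-line."""
        self.window.append(symbol)
        if len(self.window) == self.n:
            self.counts[tuple(self.window)] += 1
            self.total += 1
            # prune lazily once the table overflows its budget
            if len(self.counts) > 2 * self.capacity:
                self.counts = Counter(
                    dict(self.counts.most_common(self.capacity)))

    def prob(self, ngram):
        """Unconditional joint estimate; 0.0 for unstored tuples."""
        return self.counts.get(tuple(ngram), 0) / max(self.total, 1)

model = SparseNGram(n=2, capacity=10)
for ch in "abababac":
    model.observe(ch)
```

After consuming the stream above, the model has seen seven bigrams, three of which are ("a", "b"); dropping unstored tuples to probability zero is what makes the representation sparse, at the cost of underestimating rare events.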
Similar references
On-Line Learning of Undirected Sparse n-grams
n-grams are simple learning models considered state-of-the-art in many sequential domains. They suffer from an exponential number of parameters in their width, n. We introduce undirected sparse n-grams, which store probability estimates only for some n-tuples from the unconditional joint distribution, specifically the most frequent. Experimental results show this sparse version can outperform a...
A Hierarchical n-Grams Extraction Approach for Classification Problem
We are interested in protein classification based on their primary structures. The goal is to automatically classify proteins sequences according to their families. This task goes through the extraction of a set of descriptors that we present to the supervised learning algorithms. There are many types of descriptors used in the literature. The most popular one is the n-gram. It corresponds to a...
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
Sparse Structured Principal Component Analysis and Model Learning for Classification and Quality Detection of Rice Grains
In scientific and commercial fields associated with modern agriculture, the categorization of different rice types and determination of its quality is very important. Various image processing algorithms are applied in recent years to detect different agricultural products. The problem of rice classification and quality detection in this paper is presented based on model learning concepts includ...
Learning a Hierarchical Representation of Words within a Neural Language Model
Neural probabilistic language models have been shown to significantly outperform traditional statistical models based on n-grams (Bengio, Ducharme and Vincent, 2001). However, they are not as popular, in particular due to the high computational resources needed to both train and use them. In this project we explore the idea of learning a hierarchical representation of words in order to obtain a...
Publication date: 2004